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1    Introduction 

Much  of  the  career  of  Herbert  Solomon  has  been  devoted  to  combining  the 
identification  of  important  applications,  the  building  of  models  for  these  applications 
and  the  use  of  data  to  estimate  the  parameters  of  the  models  or  to  develop  new 
models.  His  activities  have  covered  the  vast  territory  of  statistics,  operations 
research  and  the  social  sciences.  He  has  considered  problems  arising  in  such  diverse 
areas  as  logistics,  inventories  and  queueing,  quality  control  and  inspection,  learning 
and  human  factors,  packing  and  geometrical  probability.  It  is  in  this  eclectic  spirit 
that  we  consider  a  problem  area  involving  both  probability  modelling  and  statistical 
inference. 

This  paper  presents  a  natural  generalization  and  a  statistical  analysis  for  a  general 
and  important  ciass  of  stochastic  processes,  simple  Markov  population  processes 
(SMPP).  This  class  of  processes  is  broad  enough  to  encompass  simple  queues  and 
complex  queueing  networks,  repair  models,  manpower  models,  labor  mobility  and 
migration  phenomena,  and  many  others.  The  equilibrium  theory  for  the  SMPP  is  well 
known.  The  reader  should  consult  Kingman  (1969)  or  Kelly  (1979)  for  a  thorough 
treatment  of  the  probabilistic  aspects  of  this  theory.  We  will  focus  on  statistical 
inferences  associated  with  random  parameter   versions  of   the   SMPP. 

Traditional  formulations  of  applied  probability  models  typically  emphasize  only  one 
of  the  many  possible  sources  of  random  fluctuations  that  may  influence  system 
benavior.  For  example,  extensive  treatment  has  been  given  to  a  wide  variety  of 
queueing  modeis  for  congestion  at  service  facilities.  in  these  models,  some  simple 
arrival  or  demand  process  confronts  a  given  service  process  often  presumed  to  have 
i.i.d.  random  variable,  or  possibly  Markovian,  character.  The  parameters  of  the 
component  arrival  and  service  processes  are  nearly  always  assumed  to  be  few  in 
number,  given  and  fixed.  Consequents,  the  predicted  waiting  times  and  queue 
lengths  vary  oniy  m  response  to  the  ,nnerent  variability  of  the  arrival  and  service 
processes  if  tne  parameters  of  the  latter  are  also  fixed  and  known.  It  may  be 
argued  that  sucn  internal  or  within  variability  is  not  tne  only  source  to  be  considered 
under  many  circumstances:  the  fixed  rates  may  oe  expected  to  change  occasionally, 
and  additionally  may  well  be  unknown.  Similarly,  inventory  control  models  typically 
assume  tnat  parameters  of  demand  distributions  a-e  fixed,  as  do  reliability- 
redundancy  studies  of  fauure-prone  repairable  systems,  and  the  compartment  models 
of  pharmacology. 

Perhaps  for  reasons  of  mathematical  tractamniy,  tnere  has  been  far  less  attention 
paid  to  models  having  time-dependent  parameters,  wnere  the  time  dependence  is 
either  deterministic  or  "random.'  Recently,  nowever,  models  representing  random 
environments  or  doubK  stocnasuc  effects  nave  appeared  in  the  literature.  Such 
models   should   be   useful    in   system   sudies   for    descnD  ng   realistic   situations   m   which 


parameter  values  vary  widely  because  of  weather  or  other  natural  environmental 
influences  or  because  of  the  impact  of  client,  patient,  or  opponent  action,  or  various 
other  exogenous  influences.  Thus,  the  traditional  models  should  be  extended  by 
incorporating  external  or  between  variability  components.  Because  it  is  often  natural 
to  suppose  first  that  the  basic  parameters  of  the  model  of  interest  are  drawn  from  a 
given  (with  unknown  parameters)  superpopulation,  we  refer  to  such  models  as 
hierarchical. 

The  objective  of  this  paper  is  to  formulate  and  solve  problems  of  statistical 
inference  for  hierarchical  models  which  arise  in  the  context  of  Markov  population 
processes.  We  will  assume  that  we  are  given  an  observation  of  the  sample  path 
from  a  set  of  Markov  population  processes.  Each  of  the  processes  is  governed  by 
its  own  set  of  parameters.  The  vectors  of  parameters  for  each  of  the  processes  are 
assumed  to  be  drawn  i.i.d.  from  a  parent  "superpopulation"  distribution  which  may 
itself  have  unknown  parameters.  The  goal  is  to  estimate  the  parameters  of  the 
superpopulation,  because  these  describe  the  population.  Using  this  information,  we 
wish  to  estimate  the  parameters  of  each  of  the  Markov  population  processes  to 
make  inferences  about  a  particular  instance  of  the  system,  population  or 
compartment.  If  positive  indications  are  present,  it  will  be  possible  to  pool 
information  from  other  processes  to  improve  the  estimates  of  one  in  particular,  i.e., 
to  "borrow  strength"  in  John  Tukey's  words.  On  the  other  hand,  we  seek  approaches 
which  are  "discrepancy-tolerant"  or  "robust,"  i.e.,  which  do  not  unjustifiably  pool  data 
when  counter-indications  are  presented  by  the  data;  Gaver  (1985). 

The  general  inference  problem  described  seems  to  fall  into  the  category  of  a 
standard  Bayesian  analysis  or,  more  ambitiously,  an  empirical  Bayes  or  Bayes- 
empincal  Bayes  analysis.  The  reader  might  consult  Morris  (1983)  for  a  review  of 
parametric  empirical  Bayes  methods,  Robbins  (1983)  and  in  much  previous  work,  has 
elucidated  the  non-parametric  Bayes  approach,  or  Deely  and  Lindley  (1981). 
Surprisingly,  scant  attention  seems  to  have  been  paid  to  empirical  Bayes  methods  for 
stochastic  processes. 

This  paper  is  organized  as  follows.  We  introduce  the  notion  of  Markov  population 
processes  in  Section  2.  Section  3  gives  several  examples  of  situations  in  which 
random  parameter  versions  of  Markov  population  processes  are  important.  Section  4 
develops  a  likelihood  function  and  Bayesian  statistical  inference  for  these  processes. 
Section  5  briefly  discusses  empirical  Bayes  approaches.  Section  6  points  out 
important   topics   for   future  research,   much   of   which   is  currently   in  progress. 


2   Markov   Population  Processes 

We    are    concerned    with    Markov    population    processes    in    n    dimensions.       Such    a 
process     {X(t).t    £    0}     is    a    continuous    time    Markov    process    with    state    space    S    = 

{(ir'2 'n)'    ' j *  Z"*" , "!    £   j    £    n}.      The   components   of    the   state   space   correspond   to   the 

occupancy    of    each    of    the    n    populations    or    colonies.       The    possible    transitions    are 

defined     as     follows.         Let     7V(g)     =     T.(si sn)     =     (s,...,Sj_  ,,8,-1,8,+ , s +1 sn).        The 

transition  from  state  s  to  T  s  corresponds  to  a  migration  of  an  individual  or  unit  in 
population  (colony)  i  to  population  j  .  For  a  simple  Markov  population  process  the 
flow  rate  for  such  transitions  can  depend  on  s  only  through  s  .  We  specialize  this 
requirement  further  by  requiring  the  flow  rate  to  have  the  form  X.f  .(s)  .  1  £  i,j  £  n, 
i  *  J  • 

We    also    allow    transitions    from    the    outside    into    one    of    the    colonies.       Let    T    (s) 

oj  - 

correspond    to    an    arrival    from    the    outside    to    colony   j.   This    transition    has    flow   rate 

*Qf    (s  )   and   depends    on    s    only   through   s  .      Finally,    there    can   be    departures    from    a 

colony    to    the    outside.      We    define   T    (s)   =    (s , s-1 s  ).      The    transition   from    s    to 

io  ~  1  i  n  - 

T    (s)  has   flow  rate    Xf    (s ). 

IO   -  I     10       i 

The  functions  {f(k),  O^i^n,  O^j^n}  can  often,  but  not  always,  be  thought  of  as 
known  structural  parameters.  If  one  is  studying  a  Markov  population  process  such  as 
a  queueing  network  or  compartmental  model,  these  parameters  are  determined 
directly  from  knowledge  of  the  structure  (e.g.,  the  compartment  connectivity  or  the 
number  of  servers  at  a  node  and  the  flow  of  traffic  among  the  nodes).  The  rate 
parameters  X  =  ( X  A  1 ,..., A ^)  are  generally  not  known  and  must  be  estimated  from 
data.  Several  comments  are  in  order.  First,  a  single  unknown  input  parameter  may 
not  be  sufficient,  as  there  could  be  flows  into  different  colonies  whose  relative 
magnitudes  are  not  known.  The  formulation  can  be  easily  extended  to  allow  for  as 
many  as  n  such  parameters  (one  for  each  colony),  but  we  do  not  do  so  here. 
Second,  one  must  introduce  conditions  on  X  and  tne  structural  parameters  jointly  to 
ensure  tnat  an  equilibrium  distribution  exists.  Indeed,  if  an  equilibrium  distribution 
were  to  exist,  it  would  be  of  product  form;  see  Kelly  (1979).  Fortunately,  the 
existence  of  an  equilibrium  is  unnecessary  for  our  analysis.  Tnere  is  no  need  for  tne 
process  to  be  in  equilibrium  or  to  even  have  an  equilibrium  distribution.  We  merely 
observe  a  part  of  the  sample  path  and  then  develop  estimates  even  if  the  process  is 
transient.  Similarly,  it  is  usual  to  assume  the  state  space  is  irreducible,  but  this  is 
not  necessary  for  our  purposes.  For  this  reason,  we  define  S  =  {(i  r...,in),  i.'Z+.  1  £ 
j   £   n}    and   do   not   address   the   possible   boundedness   of   certain  components. 


3  Examples  of   Simple  Markov  Population  Processes 

We  present  several  examples  to  illustrate  a  few  of  the  important  applications  of 
Markov  population  processes.  We  also  describe  the  accompanying  inference 
problems. 

3.1  Simple  Markovian  Queueing  Systems:      M/M/C 

In  this  example,  we  let  n  =  1.  The  state  variable,  x,  corresponds  to  the  number  of 
customers  in  or  awaiting  service.  The  parameter  Xq  corresponds  to  the  arrival  rate, 
while  X1  is  the  service  rate.  The  structural  parameters  are  given  by  fo1(x)  =  1  and 
f,   (x)  =   mm  (x,C)  where  C   is  tne  number  of   servers.      It    is   likely   that   either    X     or    X 

i  O  '  O  1 

or  both  are  unknown  and  must  be  estimated  from  data.  Not  only  may  X  be 
unknown,  but  it  can  also  exhibit  fluctuations  over  time,  e.g.,  from  day  to  day.  This 
is  especially  true  in  service  systems  where  demands  vary.  It  is  reasonable  to  gather 
data  over  relatively  short  periods  of  time  during  which  X  and  X|  can  be  considered 
to  be  constant.  By  introducing  a  superpopulation  distribution  over  the  possible 
<Xc,X,)  pairs,  one  can  use  these  sample  path  fragments  to  estimate  the 
superpopulation  parameters  and  the  particular  (X  .X^  realizations.  Finally,  the 
superpopulation  distribution  can  be  used  to  carry  out  a  complete  system  performance 
analysis. 

3.2  Compartment  Models  in  Pharmacology 

Compartment  models  offer  a  broad  class  of  models  often  used  to  represent  the 
movement  of  drugs  or  pollutants  through  a  system  such  as  tne  human  body.  The 
compartments  correspond  to  pools  or  tissue  groups  such  as  the  blood  stream  or 
body  organs.  Stocnastic  compartment  models  are  often  equivalent  to  an  open  or 
closed  Jackson  network  of  infinite  server  queues.  The  compartmental  structure  is 
usually    defined    by    an    n    x    n    transition    matrix    P    =    (p(  )    witn    p(     £    0,    X  p      ^    1    and 

p„   =   Zp  .      Tne   structure    functions   are   taken   to   be   f  (x )   =   p  x ,   0   i   j    i   n.   If    X      =   0 

"i o  j    "i j  ijrrij:-a,,a  o 

and  p  =  0,  1  i  i  i  n,  tnen  the  system  is  closed.  Otherwise,  it  is  open.  We  assume 
P  is  known,  but  the  analysis  can  be  carried  out  if  we  replace  A  p  x  by  X  x  where 
X     =   X    p     must  be  estimated. 

it  has  oeen  commented  on  by  Koch-Weser,  as  quoted  by  Wagner  (1975),  that  "drug 
dosages  needed  for  optimal  therapeutic  effects  differ  widely  among  patients.  The 
'usual  dose'  of  most  potent  drugs  accomplishes  little  in  some  persons,  causes 
serious  toxicity  in  otners,  and  is  fully  satisfactory  in  a  few."  This  observed 
variability  between  patients  strongly  indicates  that  tne  model  rate  parameters  X  = 
(X  ,X1f...fX  )  correspondingly  exhibit  substantial  variability.  We  imagine  that  each 
individual  draws  a  X  from  a  superpopulation  distribution,  and  observe  a  sample  path 
from   each   of   the   compartment   processes.     One   goal   is  to  estimate  the  parameters   of 


the  superpopulation  distribution,  because  this  determines  the  variation  between 
members  of  the  population.  In  addition,  it  is  important  to  estimate  each  of  the 
individual  X  values,  since  they  may  be  related  to  patient  pathology  or  classification. 
Knowledge  of  the  between  variability  may  be  used  to  strengthen  estimates  of 
individual    X  values. 

3.3  Logistic  Support  for  a  System  Depending  Upon  Repairable  Modules 

Successful  operation  of  each  of  a  set  of  I  vehicle  systems  (e.g.,  trucks,  rental  cars, 
airplanes,  or  ships)  depends  upon  the  operability  of  important  subsystems  or  modules 
(tires,  engines,  communication  and  navigation  subsystems).  Suppose  modules  are 
failure  prone  but  repairable,  and  that  each  module  type  that  is  on  a  vehjcle  in 
operation  fails  independently  at  an  (unknown)  Markovian  rate  X  ,  where  j  refers  to  a 
module  of  type  j,  "Kj^J,  and  i  refers  to  the  ilh  vehicle,  "\£\£\.  Let  the  Markovian 
repair  rate  for  modules  of  type  j  be  /j  (unknown),  meaning  that  repair  times  are 
independently  exponentially  distributed.  Suppose  that,  in  addition,  there  are  M 
modules  of  type  j  on  each  vehicle,  and  that  the  more  that  are  up  or  operable,  the 
more  effective  is  the  overall  system  operation.  In  addition  there  are  S  spare 
modules  of  type  j  in  the  system,  and  a  repair  system  that  contains  R  service 
facilities  (repair  teams,  equipment;  spare  parts  and  tools  are  considered  separately). 
Let  X  (t)  represent  the  number  of  type  j  units  up  and  either  installed  on,  or  awaiting 
installation  on,  all  systems  1£i£l  at  time  t.  Then  under  additional  stipulations 
concerning  the  service  protocol,  {X  (t),  t^O}  is  a  multivariate  birth  and  death  process. 
The    total    number    of    modules    in    the    system    is    I       (M     +    S  ),    the    total    number    of 

J  j£    1  J  J 

service   facilities   is  X      R  ,   and  the  relevant   problem   is   to  specify   near-optimal   values 

i=  i      J 
of   S     and   R     when   data   are   available   to   estimate   the   rates    X   when    u     and   a   suitable 

J  j  ~J 

measure   of   effectiveness   are   specified. 

4  Statistical   inference  for   Markov   Population  Processes 

We  first  develop  the  likelihood  function  for  simple  Markov  population  processes. 
Later   in  this   section  we   consider  Bayesian   inference. 

4.1    The   Likelihood  Function 

A  Markov  population  process  behaves  in  a  simple  fashion.  Suppose  at  time  0  it  is 
in  state  s.  It  will  remain  in  state  s  for  an  exponential  period  of  time  with  rate 
parameter 


R(s)  =    X   I  f      (s  )  -  I    X     I    f  (s  )  =    X    f    (s)  -  I    X  f  (s ) 

o(_  1  Ol         I  1=1      '     i"°    'J  o    c    -  |=  •      I    I.      I 

n  r 

where  f    (s)  =  Z  f      (s)  and  f  (s  •  =     I      f  is  ). 

°  ~        i=  i  oi    -  J  =  c     -. 


Of   course,   R(s)  is  the  rate  at  which  the  process   leaves   state  s. 

When    this    exponential    period    ends,    the    process    moves    to    a    new    state.       These 
transitions  are  specified  for    1   £   i   i  n  and  0  i  j   i  n  by 

T    (s)       with  probability       X   f    /R(s), 
T  (§)       with  probability       X(f  (s)/R(s). 


The  data  from  the   sample  path  can  be  reduced  to  a  sequence   of   states   and  sojourn 
times   in  those  states.     This   can  be  written  as  (s(1,,S,),(s(2,,S0) (s(m>,   S    )  where  s<l)  is 

-  l      -  z.  —  m  — 

the  tth  state  occupied  by  the  process.  To  remove  certain  minor  difficulties,  we  will 
assume  that  s<1)  is  not  random  but  is  deterministic.  It  is  possible  to  assume  that 
the  initial  state  is  stochastic  and  is  chosen  by  some  distribution  (usually  the 
equilibrium  distribution,  if  one  exists).  We  do  not  even  assume  the  existence  of  an 
equilibrium  and  so  simplify  matters  by  taking  s<1>  to  be  deterministic.  We  have  not 
described  the  sampling  interval  [0,T]  and  prefer  to  leave  it  unspecified.  T  may  be 
deterministic,  in  which  case  the  number  of  transitions  (m-1)  is  stochastic. 
Alternatively,    one    could    observe    the    process    until    m-1    transitions   have   occurred,    in 

m 

which    case    T    =    Z       S,    is    stochastic.       The    likelihood,    however,    would    not    differ 

t=1       l 
substantially. 

Thus,  the  sojourn  times,   S, S      contribute  a  factor  of 


n      R(s(t')    n      exp(-StR(s(t>» 


to  the  likelihood  function  for  given  X  .  The  state  transitions  also  contribute  to  the 
likelihood  function.  If  the  transition  from  s(k)  to  s<k+1)  involves  a  departure  from 
colony  i  to  colony  j,  O^j^n,  a  term  Xf  (s(k))/R(s(k))  is  multiplied  into  the  likelihood 
function.     A   term    X    f    (s  <k))/R(s(k))   is   included   if   an   individual   arrives  to   colony   i    from 

O    Ol      I  - 

the  outside.  We  let  M  =  the  total  number  of  arrivals  from  the  outside,  and  M;  =  the 
total     number    of    departures    from    colony    j  As    a    consequence,    the     likelihood 

function   is  proportional   to 

m  n  M 

exp(-  Z    S,R(s(t)»    n     XL  k.  (4.1) 

t=  1     '      "  k=o    K 


The  quantity 

I    S,R(s(t))  =  I  S,  [l    A  f  (S(l,)l   ■  £    a  Z    S,f  (s(t)»  =  £    X  W 


where         W    =  I    S,f  (s(,)) 

♦  —  i        I     I.    *" 


The    sufficient     statistics    are    given    by    (M  ,    W  )    where    M   =    (M  M  )T    and    yv  = 

(Wo Wn)T.     The   likelihood  function   is  of  the  multivariate   independent   Poisson  type: 


M 


exp(-Z    X  Win    X      j  (4.2) 


j  =  o    i      i  j  =  o    J 


4.2  Random  Parameter  Processes 

The  likelihood  given  in  equation  (4.2)  is  relevant  to  a  single  version  or  realization  of 
a   simple   Markov   population   process   (SMPP).      In   many   circumstances,    it    is   reasonable 

to    suppose   that    the    rate    parameters    X    =    (X   Xn)T    occasionally    change,    for    example 

in  response  to  external  events.  Assume  we  are  able  to  observe  K  such  different 
versions  and  wish  to  estimate  the  individual  parameters  on  the  supposition  that 
important  between-version  variability  may  exist.  The  simplest  plausible  way  to 
proceed  is  to  introduce  a  superpopulation  of  parameters  X  having  density  f(X|g) 
where  $  denotes  a  hyperparameter  vector.  The  K  observed  sample  paths  are  then 
analyzed  as  if  the  parameters  of  each  version  were  selected  at  random  using  the 
density  f.  Specifically,  let  X(1),...,XtK>  be  u.d.  random  vectors  with  density  f(X|g). 
Here  X(k)  corresponds  to  the  rate  parameters  for  the  kth  observed  SMPP,  {Xk(t), 
0<t<Tk}. 

The  assumption  of  independent  sampling  allows  one  to  simply  construct  an  overall 
likelihood  that  incorporates  the  information  as  well  as  that  in  the  superpopulation 
density   f. 

4.3  Bayesian   Inference 

Simple  Bavesian  inference  assumes  the  superpopulation  density  f  to  be  completely 
specified.  in  the  case  of  a  superpopulation  density  f(X|£),  the  hyperparameters 
£  would  be  treated  as  known,  presumably  by  elicitation.  Estimation  of  X(k)  is 
accomplished  Dy  finding  the  posterior  density  of  X(k)  given  {Xk(t),  0^t<;Tk}  and  then 
computing  some  appropriate  estimator  such  as  the  posterior  mean  vector  or  the 
posterior   mode   vector.      In  this   section,  we  consider   three  particular   parametric   priors. 

4.3.1    Conjugate   Gamma   Prior 

Tne  easiest  situation  occurs  when  we  introduce  a  multivariate  gamma  prior 
distribution   with   independent   marginals.     Specifically,   we   assume 


f<Xj*)  =  n    AV.V^xp^V)  ir(a),    X     >   0.       0  i    j    i   n.  (4.3) 

j  =  o 


The   hyperparameters   are   specified   by    f  -   (a,§).      For   a   likelihood   function  given  by 
(4.2),  the  posterior  distribution  of   X  given   {X(t),  0  i  t   i   1}    is  given  by 

f(\\$,   X)  =?I    (^+Wj)flJ  +  MJX  cJ+MJ-ie-(/?J+wj)Xj/ru     M)#    X    >   0,       0  i  j   in  (4.4) 


The  posterior  distribution  has  independent  gamma  marginals.  It  is  simple  to  estimate 
any  of  the  individual  X  or  some  function  of  them.  The  posterior  mean  of  X  is 
given  by  (a  +M  )I{J3  +W ),  while  the  posterior  mode  is  given  by  (a  +M  -1)/(>?  +W ) 
provided  a  +M  \  1.  The  posterior  mode  is  seen  to  resemble  the  mean,  and  it  turns 
out  to  be  a  convenient  approximation.  We  assume  that  the  hyperparameters  a  = 
[a   ,...,a  )     and     0    -     (JS  B  )     are     known,     presumably     by     ehcitation     and     relevant 

on  *-  'on  r  '  ' 

experience. 

4.3.2   Multivariate  Log-Normal  Prior 

The  previous   choice   of   conjugate   prior   offers   considerable   mathematical   tractability; 

however,     it    does    not    permit    the     X    parameters    to    be    correlated.       Furthermore,    a 

common     choice     of     prior     for     univariate     Poisson     process     data     in     reliability     and 

probabilistic    risk    assessment    studies    is    a    log-normal    distribution,    see    Rasmussen 

(1975),   Hill,   Heger    and   Koen  (1984),   or   Kaplan  (1983).      We   introduce    *     =    logX     and   let 

*  =.(«  ,...,<  )T  -  N(  B,Z)  where  §  is  an  n+1  dimensional  mean  vector  and  X  is  an  (n+1)  x 
—  ~o  n  *■  *- 

(n+1)  positive  definite  covanance  matrix.  This  formulation  provides  both  log-normal 
marginals  and  the  possibility  of   correlated  components   in  a  familiar   fashion. 

The    log-posterior    distribution    is    obtained    from    the    prior    and    equation    (4.2)    and    is 
given  by 

log   f(X!g,x)  =log  C   -  (<-//)TI"  Hi~#)i2   -   >T-W*    jfTM 


where  C   is  a  normalization  constant   and  W=  (W  W  )T  and  M  =  (M  ,...,M  )T 

~  on  -  on 

It   follows  that    log  f(X|g,   X)   is  given,   up  to  constants,   by 

log   f^i^X)   =   "(jf-^)1  I_1(£-^   -    XTW+   iTM.  (4.6) 


One   can   consider    finding   the   posterior   mode   by   differentiating   (4.6)  with   respect   to 
*.      The   first   derivative   of   (4.6)  with   respect   to    *    is 


-1-Hl-g)  +   M-  D1  (4.7) 

where    0    is    a    diagonal    matrix    with    entries    Wexp(<  ),    and    1    is    a    column    vector    of 
ones. 

The  second  derivative  of  (4.6)  with  respect  to   *    is  given  by 

-I"1    -  D  (4.8) 

The    matrix    -    (Z" 1     +    D)    is    clearly    negative    definite,    thus    one    can    find    the    unique 
posterior   mode  by   solving 

0  =   -  I~1(«-£)  +   M-  01.  (4.9) 

Equation  (4.9)  cannot  be  solved  in  closed  form,  but  a  numerical  solution  may  always 
be  obtained  by  a  Newton-Raphson  iteration  approach.  As  a  starting  solution,  one 
might  use  the  vector  of  logarithms  of  the  raw  rates,  i.e.,  ( (o)  =  log(M /W  ).  In  case 
small,  occasionally  zero,  M  values  occur,  one  might  replace  M  by  M  +0.5.  The 
Newton-Raphson  will  take  the  form 

£<£f1)  =   til)  -  (I-1+or1   [l-H*Jl)  -  M  -  M+  Dl)].  (4.10) 


If  the  suggested  starting  value    t (o)  =   log(M  ,'W  )  is  used,  the  first   iteration   leads  to 
,U)  =   £(o)  .  (I-''+o)-1I-1(i(o)-iu).  (4.111 


This  improved  estimate  of  *_  is  only  a  first  approximation  to  the  true  solution,  but  it 
has  a  familiar  form  and  intuitive  content,  namely  a  weighted  combination  of  the  raw 
rates  and  the  prior  mean.  As  the  variability  of  the  superpopulation  decreases,  more 
weight  is  put  on  the  superpopulation  mean.  jj.  Both  (4.10)  and  (4.11)  exhibit  the 
tendency  for  linear  snnnkage  of  the  raw  estimate  vector  toward  ^,  irrespective  of  the 
relation  of  the  log  observed  rates  to  the  superpopulation  center.  Such  linear 
shrinkage  behavior  also  characterizes  estimates  obtained  from  the  conjugate  gamma 
superpopulations.  it  appears  to  be  inappropriate  for  highly  discrepant  observations 
that    may    occur.      Some    observed    rates    may    actually    choose    not    to    confom     to    tne 
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relatively  short-tailed  gamma  and  log-normal  distributions,  even  if  the  parameters  of 
the  latter  are  obtained  from  expert  judgement.  Such  outliers  may  be  the  consequence 
of   recording  errors  or  they  may  be  the   manifestation  of  unsuspected   influences. 

To  obtain  information  about  the  joint  or  marginal  posterior  for  t  and  hence  for  X 
=  exp(<  ),  it  is  necessary  to  resort  to  numerical  integration.  It  has  been  found  that 
an  initial  Laplace's  method  approximation,  for  which  one  should  consult  Mosteller  and 
Wallace  (1964)  or  Kadane  and  Tierney  (1984),  followed  by  a  few-point  Gauss-Hermite 
integration,  can  be  quite  effective;  see  Gaver  (1985)  for  a  discussion  of  the  univariate 
simple  Poisson  case.  By  this  approach,  one  can  assess  the  sampling  error  of  the 
point  estimate  achieved  from  the  model  estimate  suggested  above.  One  can  also 
compute  alternative  estimators  such  as  the  perennial  favorite,  the  posterior  mean,  or 
a  weighted  version  thereof. 

4.3.3  Multivariate  Log- Student  t  Prior 

In  order  to  recognize  the  possible  existence  of  rates  of  greater  discrepancy  than 
admitted  by  a  gamma  or  log-normal  superpopulation,  we  introduce  the  multivariate 
log-Student  distribution.  This  superpopulation  has  longer  tails  than  the  normal,  and 
hence  should  yield  interesting  estimates  for  the  rates.  It  can  be  obtained  by  scale 
mixing  the  multivariate  log-normal.  A  formula  for  the  multivariate  t  appropriate  here 
is 

f<«!#A»  =   C      ,<Det(A)-1/2(1+QU))(r^'+k)/2  (4.12) 


where  Q   =   Q(f)  =   if^A    Hf^lk,  &    is   a   covanance   matrix,   k  is   a  degree  of   freedom 

parameter,    and    C    ,  .     is    a    normalization    constant.       See,    for  example,    Mardta,    et    al 
(1979),   page   57  and   associated  references  therein. 

in   order   to   determine   the   modes   of   the   posterior   distribution  we   proceed   as   before 

by    examining    the    log-posterior    for    X    or    equivalently    for    <.  By    omitting    irrelevant 
constants  we   find 

log   Hi\$.X)  =   -  (n+1+k)log(1+Q)/2  -   XTW  +    *T-M.  (4.13) 

The   first   derivative  of  (4.13)  with  respect   to    «    is  given  by 

-(n^1-k)A-lu-//)/(2k(l+Q))   -  D1    *    M.  (4.14) 


11 

We  define  X  =   2k(1+Q)^/(n+1+k)  and  rewrite  (4.14)  as 

-X-Hi-tf)  -  D1    +  M.  (4.15) 

It    should    be    noted    that    (4.15)    is    identical    to    (4.7)   which    arises    with    the    log-normal 
superpopulation  except  that   in  (4.15)  I  is  a  function  of    «   through  Q. 

The  second  derivative  of  (4.14)  can  be  written  as 

-  D  -  [l_1    -  I-1<£-ii)<:f-/rI-1/(n+1+k)].  (4.16) 

It   is  now  possible   for   there  to  be  multiple  solutions  of  the   likelihood  equations 

Q  =   -1-Hi-jj)  -  D1    +  M.  (4.17) 


and  there  can  be  more  than  one  local  maximum  in  the  posterior  distribution.  The  use 
of  a  posterior  mean  estimate  becomes  questionable.  It  will  minimize  a  mean  square 
error  criterion  but  will  tend  to  give  an  estimate  between  the  two  if  such  modes 
exist.       An    alternative    approach    is    to    take    a    data-based    initial    estimate    of    *,    * 


(O)       ; 

J 

log(M /W  ).     The    <}o)   is  used  to  compute  an   initial  Z(o).     Equation  (4.17)  is  replaced  by 


j      j 


O  =   -(I(o,r1(<-£)  -  D]    +  M.  (4.18) 


which  is  a  particular  case  of  (4.17)  and  wnich  has  a  unique  solution.  This  solution 
often  gives  useful  results  especially  when  dispersion  between  individual  rates  is 
large;  see  Gaver  (1985).  The  matrix  ^  has  been  weighted  by  an  initial  discrepancy 
factor  2k(l+Q)/(n+ 1+k).  When  this  factor  is  large,  then  the  shrinkage  toward  #  is 
reduced.  This  is  a  desired  effect  for  the  multivariate  log-Student  t  superpopulation 
whicn  has  heavy  tails.  Values  of  *_  far  from  the  mean  are  possible,  consequently  in 
such  cases  tne  procedure  refuses  to  "borrow  strength"  when  it  is  unwarrented.  Such 
a  procedure    is   "discrepancy-tolerant"   or   robust. 

5  Empirical   Bayes   Methods 

In  this  section,  we  discuss  the  empirical  Bayes  approach  to  these  problems.  We 
consider  the  previous  formulations  but  assume  that  the  superpopulation 
hyperparameters  are   not   known.     Thev   must   be   estimated   from   all   the  observations. 
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5.1   Conjugate  Gamma  Superpopulation 

In    the    previous    section,    we    found    that    the    conjugate    gamma    prior    offered    an 
especially      tractable      estimation      solution.  We       did,       however,       assume      the 

hyperparameters   were   known.     Now  we   consider   estimating  them   directly  from  the  K 
sample   paths.      This    is   done   by   finding  the   distribution   of   a   sample   path   conditional 

only   on  the   hyperparameters    a   and   §.   that    is   with    X   =  (XQ X^)   integrated  out.     This 

takes  on  a  multivariate   independent  negative  binomial   form, 

n  ■  a  +\a/ 

f  <M  .W  !  a,§  )  =  II      Ha  .+M.)fl*il(r{a.)(fi.)    '       ').  (5.1) 

i=o  ' 


We    now   compute    maximum    likelihood    estimators    of    a   and   §  given    the    data    from 
the  K  sample  paths,  (M  (1>,W(1)) (M  (K).W(K>). 

The   log-likelihood  function  for   a  and  §  is  given  by 

K       n 

Ka.ffj  -  Z    I  (H(a +M(k,)-H(a))^  log/fl-(c +M(k))log(^*W(k,>.  (5.2) 

k=1i  =  o  '  i  i  i        i        i  ii 


where  H(x)  =   log  T(x). 

The  expression   in  (5.2)  must   be  maximized  over    a  and  §. 

Let    us   assume    for    the    moment    that    a    is    treated    as   known,    hence   only   §  must   be 
estimated.     This   is  done  by   solving  the  likelihood  equations. 

0   =  1    (a  I  ft    -  (a  +MK)/(/?+Wk)),  0   i    i   i   n.  (5.3) 

k=i     '     '  i       i        i       i 


K  K 

We  assume  that   both  Z      Wk   and  Z      M1,  are  positive.     The  case   in  which  both  are  0 

k= i       '  k=  1 

can   be   handled   separately   in   a   straightforward   fashion.      It   can   arise   only  when  there 

is  no  activity   in  a  particular  colony  in  all  tne  K  sample  paths. 

Equation  (5.3)  cannot  De  solved  in  closed  form;  however,  it  does  have  a  unique 
solution  wmch  can  be  found  numerically.  This  can  be  seen  by  rewriting  (5.3)  for  a 
particular   i   as 

K 

aK   =   l     (a  +Mk)(y!?/(/?+Wk)).  (5.4) 


k=  1 


i     '   i    '   i         i 


The  right    side   is   monotone   increasing   in  p.      It   is   0   for   /#    =   0  and   increases   to   aK 
X     Mk   >   A  K   as   li    approaches   infinity,   hence  there   is   a  unique  root. 

k=  1       ' 
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When    a    must    also    be    estimated,    we    must    introduce    a    second    set    of    likelihood 
equations. 

K  K 

0  =  I      {H'U+M^-H'U)  -  I     log((A+WM/A).       0  i  j   in.  (5.5) 


Equations  (5.3)  and  (5.5)  must  be  solved  simultaneously  for  a  and  §.  This  problem 
is  the  multivariate  version  of  one  discussed  by  Deely  and  Lindley  (1981).  Indeed  the 
multivariate  problem   separates   into  K  univariate  ones. 

5.2   The  Log-normal   and  Log-student  Superpopulation 

In  order  to  estimate  the  superpopulation  parameters  <£,£)  or  (#.&)  in  the  normal  and 
student  cases  an  integrated  likelihood  must  be  formed,  analogous  to  that  for  the 
gamma  model.  There  is  no  way  of  avoiding  approximation  or  numerical  integration 
for  trial  parameter  values,  followed  by  a  search  of  some  sort  to  locate  the  maximum 
likelihood  estimates  (#,!.  )  in  the  normal  case,  or  (^,Z  ,k)  in  the  Student  case.  Such  a 
program  has  been  carried  out  and  tested  in  the  univariate  case,  both  on  simulated 
and  observational  data,  see  Gaver(l985).  The  integration  was  conducted  by  taking  a 
preliminary  Laplace  method  (quadratic  approximation  to  the  log-posterior)  approach, 
with  correction  furnished  by  Gauss-Hermite  integration.  Such  a  procedure  appears 
fairly  satisfactory  even  for  the  Student  superpopulation,  although  the  latter 
sometimes  admits  two  posterior  modes.  Such  a  program  becomes  far  more 
ambitious  in  the  multivariate  situation  addressed  in  this  paper,  and  further 
approximations  may  well  be  required  in  order  to  reduce  computing.  Of  course,  some 
assessment  of  the  sampling  errors  associated  with  superpopulation  estimates  will 
also  oe  desirable.  It  is  likely  that  bootstrapping  will  be  useful,  and  some 
experiments  in  the  univariate  Poisson-log-Student  case  have  already  shown  its 
potential. 

6  Summary  and  Further   Remarks 

We  have  presented  here  an  enhanced  version  of  a  quite  general  familiar  and  useful 
stochastic  model.  The  enhancement  recognizes  between-version  variation  in  process 
parameters;  such  may  be  the  result  of  endogenous  influences.  We  have  then 
addressed  the  problem  of  process  parameter  estimation  by  characterizing  the  between 
variability  component  with  the  aid  of  parametric  superpopulations.  In  particular,  it 
has  been  shown  that  the  familiar  linear  shrinkage  effects  often  encountered  in  Bayes 
analyses  using  priors  of  conjugate  form  are  interestingly  modulated  when  longer- 
tailed,   discrepancv-toierant   priors  or   superpopulations   are   introduced. 
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There  are  many  directions  for  future  work,  some  of  which  are  currently  being  taken. 
Models  that  permit  the  between  component  of  variability  to  arise  from  a  multivariate 
stochastic  process,  e.g.,  multivariate  log-Gaussian  process  rates,  are  attractive  when 
endogenous  influences  may  occur  in  a  time-series-like  manner,  as  may  be  true  of 
weather  or  economic  effects.  Hyperparameter  estimation  and  model  diagnosis  again 
present  problems.  Such  problems  promise  to  require  the  computer-intensive  activity 
that  characterizes  much  of  modern  statistics  and  operations  research.  It  is  our  hope 
that  the  results  will,  in  spirit,  resemble  the  various  interdisciplinary  statistical  efforts 
of  Herbert   Solomon,  to  whom  this  paper   is  dedicated. 
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